A new distance measure for non-identical data with application to image classification
نویسندگان
چکیده
Distance measures are part and parcel of many computer vision algorithms. The underlying assumption in all existing distance measures is that feature elements are independent and identically distributed. However, in real-world settings, data generally originate from heterogeneous sources even if they do possess a common data-generating mechanism. Since these sources are not identically distributed by necessity, the assumption of identical distribution is inappropriate. Here, we use statistical analysis to show that feature elements of local image descriptors are indeed non-identically distributed. To test the effect of omitting the unified distribution assumption, we created a new distance measure called the Poisson-Binomial Radius (PBR). PBR is a bin-to-bin distance which accounts for the dispersion of bin-to-bin information. PBR’s performance was evaluated on twelve benchmark data sets covering six different classification and recognition applications: texture, material, leaf, scene, ear biometrics and category-level image classification. Results from these experiments demonstrate that PBR outperforms state-of-the-art distance measures for most of the data sets and achieves comparable performance on the rest, suggesting that accounting for different distributions in distance measures can improve performance in classification and recognition tasks. Keywords— Poisson-Binomial distribution, semi-metric distance, non-identical data, distance measure, image classification, image recognition.
منابع مشابه
Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring
In data mining, clustering is one of the important issues for separation and classification with groups like unsupervised data. In this paper, an attempt has been made to improve and optimize the application of clustering heuristic methods such as Genetic, PSO algorithm, Artificial bee colony algorithm, Harmony Search algorithm and Differential Evolution on the unlabeled data of an Iranian bank...
متن کاملImprovement of the Classification of Hyperspectral images by Applying a Novel Method for Estimating Reference Reflectance Spectra
Hyperspectral image containing high spectral information has a large number of narrow spectral bands over a continuous spectral range. This allows the identification and recognition of materials and objects based on the comparison of the spectral reflectance of each of them in different wavelengths. Hence, hyperspectral image in the generation of land cover maps can be very efficient. In the hy...
متن کاملHierarchical Group Compromise Ranking Methodology Based on Euclidean–Hausdorff Distance Measure Under Uncertainty: An Application to Facility Location Selection Problem
Proposing a hierarchical group compromise method can be regarded as a one of major multi-attributes decision-making tool that can be introduced to rank the possible alternatives among conflict criteria. Decision makers’ (DMs’) judgments are considered as imprecise or fuzzy in complex and hesitant situations. In the group decision making, an aggregation of DMs’ judgments and fuzzy group compromi...
متن کاملA new vector valued similarity measure for intuitionistic fuzzy sets based on OWA operators
Plenty of researches have been carried out, focusing on the measures of distance, similarity, and correlation between intuitionistic fuzzy sets (IFSs).However, most of them are single-valued measures and lack of potential for efficiency validation.In this paper, a new vector valued similarity measure for IFSs is proposed based on OWA operators.The vector is defined as a two-tuple consisting of ...
متن کاملModification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Pattern Recognition
دوره 63 شماره
صفحات -
تاریخ انتشار 2017